Aesthetics & Scales with Pokémon

DSST 289: Introduction to Data Science

Erik Fredner

2024-09-03

Outline

  • Homework
  • Aesthetics & scales: so what?
  • Aesthetics & scales with Pokémon
    • pokemon data
    • geom_point
    • geom_text & label
    • scale_ & limits
    • n.breaks
    • color
    • scale_color
    • size
    • shape
    • facet_ing plots

Homework

  • Share the data visualizations you found online with the people sitting near you.
  • What kind of data would you need to recreate them?

Aesthetics: so what?

Aesthetics (such as color, size, shape, etc.) determine how data points are visually distinguished in a plot.

For example:

Democrats vs. Republicans
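
A distinction that may help here is mapping an aesthetic to a variable versus setting it to a constant. The sketch below assumes a small, hypothetical `starters` data frame copied from the first six rows of pokemon.csv; "steelblue" is an arbitrary color choice.

```r
library(ggplot2)

# Hypothetical stand-in for the full data: six first-generation Pokémon
starters <- data.frame(
  name    = c("Bulbasaur", "Ivysaur", "Venusaur",
              "Charmander", "Charmeleon", "Charizard"),
  type_1  = c("Grass", "Grass", "Grass", "Fire", "Fire", "Fire"),
  hp      = c(45, 60, 80, 39, 58, 78),
  defense = c(49, 63, 83, 43, 58, 78)
)

# Mapped: `color` inside aes() varies with the data and gets a legend
p_mapped <- ggplot(starters) +
  geom_point(aes(x = defense, y = hp, color = type_1))

# Set: `color` outside aes() is one constant value for every point
p_set <- ggplot(starters) +
  geom_point(aes(x = defense, y = hp), color = "steelblue")
```

Printing `p_mapped` should show two colors and a legend; `p_set` should show a single color and no legend.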

Scales: so what?

  • Scales control how data is mapped onto visual dimensions like the x- and y-axes.
  • Proper scaling can prevent misleading representations.
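
One common way a scale can mislead is a truncated axis. The sketch below is an illustration rather than part of the lecture code: it uses `geom_col()` and `coord_cartesian()`, and the 75 to 85 window is an arbitrary assumption. The two hp values come from pokemon.csv.

```r
library(ggplot2)

# Two hp values taken from the pokemon data
top_two <- data.frame(name = c("Venusaur", "Charizard"), hp = c(80, 78))

# Full axis: bars start at 0, so bar heights are proportional to hp
p_full <- ggplot(top_two) +
  geom_col(aes(x = name, y = hp))

# Truncated axis: showing only 75 to 85 exaggerates a 2-point difference
p_truncated <- p_full +
  coord_cartesian(ylim = c(75, 85), expand = FALSE)

# Ratio of visible bar heights in each version
true_ratio     <- 80 / 78                # about 1.03
apparent_ratio <- (80 - 75) / (78 - 75)  # about 1.67
```

The data are identical in both plots; only the visible range changes, yet one bar appears about two-thirds taller than the other in the truncated version.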

pokemon data

Code
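# load the packages used in these slides (assumed to be installed):
library(tidyverse) # read_csv(), filter(), ggplot(), pivot_longer()
library(ggrepel)   # geom_text_repel()
library(knitr)     # kable()

pokemon <- read_csv("../data/pokemon.csv")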

# take a look at the data:
pokemon |> 
  head() |> 
  kable()
pokedex_no  name        form  type_1  type_2  stat_total  hp  attack  defense  sp_attack  sp_defense  speed  generation
1           Bulbasaur   NA    Grass   Poison  318         45  49      49       65         65          45     1
2           Ivysaur     NA    Grass   Poison  405         60  62      63       80         80          60     1
3           Venusaur    NA    Grass   Poison  525         80  82      83       100        100         80     1
4           Charmander  NA    Fire    NA      309         39  52      43       60         50          65     1
5           Charmeleon  NA    Fire    NA      405         58  64      58       80         65          80     1
6           Charizard   NA    Fire    Flying  534         78  84      78       109        85          100    1

Aesthetics & Scales with Pokémon

By default, the Pokémon with the highest defense and hp appear in the top-right corner:

Code
pokemon |>
  ggplot() +
  geom_point(aes(x = defense, y = hp))

Modifying scales

Suppose we want to flip the y-axis so that the Pokémon with the highest defense and lowest hp appear in the top-right corner.

Code
pokemon |>
  ggplot() +
  geom_point(aes(x = defense, y = hp)) +
  # reverse the y-axis
  scale_y_reverse()

Combining scale_, aes, & geom_

Who has low hp and high defense?

Code
pokemon |>
  ggplot() +
  geom_point(aes(x = defense, y = hp)) +
  scale_y_reverse() +
  # new:
  geom_text(aes(x = defense, y = hp, label = name))
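
With many points, plain `geom_text()` labels tend to pile up on top of each other. A minimal sketch of one built-in mitigation, `check_overlap = TRUE`, on a hypothetical six-row `starters` data frame copied from pokemon.csv; the `vjust` value is an arbitrary choice.

```r
library(ggplot2)

# Hypothetical stand-in for the full data
starters <- data.frame(
  name    = c("Bulbasaur", "Ivysaur", "Venusaur",
              "Charmander", "Charmeleon", "Charizard"),
  hp      = c(45, 60, 80, 39, 58, 78),
  defense = c(49, 63, 83, 43, 58, 78)
)

# Aesthetics given in ggplot() are inherited by every layer
p_labels <- ggplot(starters, aes(x = defense, y = hp)) +
  geom_point() +
  scale_y_reverse() +
  # skip any label that would overlap a label already drawn;
  # vjust shifts each label vertically off its point
  geom_text(aes(label = name), check_overlap = TRUE, vjust = -0.8)
```

`check_overlap` hides labels rather than moving them, so some points can end up unlabeled.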

Limiting scales

Code
pokemon |>
  ggplot() +
  geom_point(aes(x = defense, y = hp)) +
  scale_y_reverse() +
  # repel the text labels:
  geom_text_repel(aes(x = defense, y = hp, label = name)) +
  # limit the x-axis to `defense` of 150 or more:
  # `NA` ("Not Available") is a missing value indicator.
  # We use it here to say that there is no upper limit on the x-axis.
  scale_x_continuous(limits = c(150, NA))
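
Note that scale limits do more than zoom: values outside the limits are treated as missing and dropped. A hedged sketch of the difference, using a hypothetical six-row `starters` data frame and an arbitrary cutoff of 60, compared with `coord_cartesian()`, which only changes the visible window:

```r
library(ggplot2)

# Hypothetical stand-in for the full data
starters <- data.frame(
  name    = c("Bulbasaur", "Ivysaur", "Venusaur",
              "Charmander", "Charmeleon", "Charizard"),
  hp      = c(45, 60, 80, 39, 58, 78),
  defense = c(49, 63, 83, 43, 58, 78)
)

base <- ggplot(starters) +
  geom_point(aes(x = defense, y = hp))

# Scale limits: points with defense below 60 are dropped from the plot
p_limits <- base + scale_x_continuous(limits = c(60, NA))

# Coordinate limits: a similar window, but every point stays in the data
p_zoom <- base + coord_cartesian(xlim = c(60, 90))
```

For a scatterplot the two look alike. The difference matters once a layer summarizes the data (a smoother, for example), because the scale version computes the summary only from the points that remain.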

Increasing n.breaks

Code
pokemon |>
  ggplot() +
  geom_point(aes(x = defense, y = hp)) +
  scale_y_reverse() +
  geom_text_repel(aes(x = defense, y = hp, label = name)) +
  # make it easier to identify the precise values of `defense`:
  scale_x_continuous(limits = c(150, NA), n.breaks = 30)
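
`n.breaks` is a request, not a guarantee: ggplot2 may pick a slightly different number to keep the labels tidy. When exact tick positions matter, `breaks` accepts them directly. A small sketch on a hypothetical `starters` data frame; the 40 to 90 range and step of 5 are assumptions chosen to fit these six rows.

```r
library(ggplot2)

# Hypothetical stand-in for the full data
starters <- data.frame(
  name    = c("Bulbasaur", "Ivysaur", "Venusaur",
              "Charmander", "Charmeleon", "Charizard"),
  hp      = c(45, 60, 80, 39, 58, 78),
  defense = c(49, 63, 83, 43, 58, 78)
)

# Tick marks at 40, 45, ..., 90
x_breaks <- seq(40, 90, by = 5)

p_breaks <- ggplot(starters) +
  geom_point(aes(x = defense, y = hp)) +
  scale_x_continuous(breaks = x_breaks)
```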

Color

  • Mapping color to a variable can reveal patterns across groups in the data.
  • e.g., Are there relationships between type_1, defense, and hp?
  • We’ll also filter for first-generation Pokémon to reduce the number of points.

Color by type_1

Code
pokemon |>
  filter(generation == 1) |>
  ggplot() +
  geom_point(aes(x = defense, y = hp, color = type_1)) +
  geom_text_repel(aes(x = defense, y = hp, label = name))

Custom color

Let’s use colors associated with 🔥, 🍃, and 💧 Pokémon:

Code
pokemon |>
  filter(generation == 1) |>
  filter(type_1 %in% c("Water", "Fire", "Grass")) |>
  ggplot() +
  geom_point(aes(x = defense, y = hp, color = type_1)) +
  geom_text_repel(aes(x = defense, y = hp, label = name)) +
  # assign a color to each `type_1` instead of using the defaults:
  scale_color_manual(values = c(
    Water = "blue",
    Fire = "red",
    Grass = "green"
  ))

scale_color

Mewtwo has a high stat_total:

Code
pokemon |>
  filter(generation == 1) |>
  ggplot() +
  # color the points by `stat_total` instead of `type_1`:
  geom_point(aes(x = defense, y = hp, color = stat_total)) +
  # use the `viridis` color palette instead of the default:
  scale_color_viridis_c() +
  geom_text_repel(aes(x = defense, y = hp, label = name))
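
`scale_color_viridis_c()` also takes arguments for picking a different palette in the viridis family and for reversing its direction. A minimal sketch on a hypothetical `starters` data frame; "magma" and `direction = -1` are arbitrary choices.

```r
library(ggplot2)

# Hypothetical stand-in for the full data
starters <- data.frame(
  name       = c("Bulbasaur", "Ivysaur", "Venusaur",
                 "Charmander", "Charmeleon", "Charizard"),
  hp         = c(45, 60, 80, 39, 58, 78),
  defense    = c(49, 63, 83, 43, 58, 78),
  stat_total = c(318, 405, 525, 309, 405, 534)
)

base <- ggplot(starters) +
  geom_point(aes(x = defense, y = hp, color = stat_total))

# Default viridis palette
p_default <- base + scale_color_viridis_c()

# "magma" palette, reversed so that high values get the dark end
p_magma <- base + scale_color_viridis_c(option = "magma", direction = -1)
```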

size

Magikarp has a low stat_total:

Code
pokemon |>
  filter(generation == 1) |>
  # just water pokemon
  filter(type_1 == "Water") |>
  ggplot() +
  # new: `size` by `stat_total`
  geom_point(aes(x = defense, y = hp, size = stat_total)) +
  geom_text_repel(aes(x = defense, y = hp, label = name))
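
Like color, size has its own scale. As a sketch, `scale_size()` with a `range` controls how small and how large the points get; the range of 2 to 10 and the `starters` data frame are assumptions for illustration.

```r
library(ggplot2)

# Hypothetical stand-in for the full data
starters <- data.frame(
  name       = c("Bulbasaur", "Ivysaur", "Venusaur",
                 "Charmander", "Charmeleon", "Charizard"),
  hp         = c(45, 60, 80, 39, 58, 78),
  defense    = c(49, 63, 83, 43, 58, 78),
  stat_total = c(318, 405, 525, 309, 405, 534)
)

p_size <- ggplot(starters) +
  geom_point(aes(x = defense, y = hp, size = stat_total)) +
  # lowest stat_total is drawn at size 2, highest at size 10
  scale_size(range = c(2, 10))
```

A wider range makes differences in `stat_total` easier to see, at the cost of more overlap between large points.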

Combine size and color

Code
pokemon |>
  filter(generation == 1) |>
  # just psychic pokemon
  filter(type_1 == "Psychic") |>
  ggplot() +
  # new: `color` by `stat_total`, too
  geom_point(aes(x = defense, y = hp, size = stat_total, color = stat_total)) +
  # use the `viridis` color palette instead of the default:
  scale_color_viridis_c() +
  geom_text_repel(aes(x = defense, y = hp, label = name))

Combining color and shape

Code
pokemon |>
  # filter for first gen
  filter(generation == 1) |>
  # filter for a few types
  filter(type_1 %in% c("Normal", "Rock", "Bug", "Poison")) |>
  ggplot() +
  geom_point(aes(
    x = defense,
    y = hp,
    # new: shape points by `type_1`
    shape = type_1,
    # color points by `stat_total`
    color = stat_total
  )) +
  scale_color_viridis_c() +
  geom_text_repel(aes(x = defense, y = hp, label = name))
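
Shapes can be chosen by hand in the same way as colors, with `scale_shape_manual()`. A minimal sketch on a hypothetical `starters` data frame; shape 17 is a filled triangle and 15 a filled square, and their assignment to types is arbitrary.

```r
library(ggplot2)

# Hypothetical stand-in for the full data
starters <- data.frame(
  name    = c("Bulbasaur", "Ivysaur", "Venusaur",
              "Charmander", "Charmeleon", "Charizard"),
  type_1  = c("Grass", "Grass", "Grass", "Fire", "Fire", "Fire"),
  hp      = c(45, 60, 80, 39, 58, 78),
  defense = c(49, 63, 83, 43, 58, 78)
)

p_shape <- ggplot(starters) +
  # size is set outside aes(), so it is constant for all points
  geom_point(aes(x = defense, y = hp, shape = type_1), size = 3) +
  # named vector: each `type_1` value gets its own shape code
  scale_shape_manual(values = c(Fire = 17, Grass = 15))
```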

Bonus: facet-ing plots

Code
# faceting allows us to split a plot into multiple panels based on a factor
pokemon |>
  filter(generation == 1) |>
  filter(type_1 %in% c("Normal", "Rock", "Bug", "Poison")) |>
  ggplot() +
  geom_point(aes(x = defense, y = hp, color = stat_total)) +
  scale_color_viridis_c() +
  # new: `~` means "by", so we are saying "facet wrap by `type_1`"
  facet_wrap(~type_1) +
  # note that the scales of the plots are all the same
  # this makes them directly comparable
  geom_text_repel(aes(x = defense, y = hp, label = name))
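
For contrast, `facet_wrap()` can also let each panel choose its own axes with `scales = "free"`. The sketch below uses a hypothetical `starters` data frame; it shows the option, not a recommendation.

```r
library(ggplot2)

# Hypothetical stand-in for the full data
starters <- data.frame(
  name    = c("Bulbasaur", "Ivysaur", "Venusaur",
              "Charmander", "Charmeleon", "Charizard"),
  type_1  = c("Grass", "Grass", "Grass", "Fire", "Fire", "Fire"),
  hp      = c(45, 60, 80, 39, 58, 78),
  defense = c(49, 63, 83, 43, 58, 78)
)

p_free <- ggplot(starters) +
  geom_point(aes(x = defense, y = hp)) +
  # each panel gets its own x- and y-axis range
  facet_wrap(~type_1, scales = "free")
```

Free scales use the space in each panel more fully, but positions can no longer be compared across panels without reading the axis labels.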

Coming soon: facet everything

Code
pokemon |>
  # we are "pivoting" the data from wide to long format for ease of plotting
  # we will discuss this later in the course
  pivot_longer(
    cols = c(attack, sp_attack, defense, sp_defense, speed),
    names_to = "stat_category",
    values_to = "stat_value"
  ) |>
  ggplot() +
  geom_point(aes(x = stat_value, y = hp, color = stat_total)) +
  scale_color_viridis_c() +
  # new: facet by `stat_category`, a column we created with `pivot_`
  facet_wrap(~stat_category)
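
To preview what "wide to long" means, here is a small sketch of `pivot_longer()` on two rows and two stat columns. The `wide` data frame is a hypothetical excerpt whose values are copied from pokemon.csv.

```r
library(tidyr)

# Wide format: one row per Pokémon, one column per stat
wide <- data.frame(
  name    = c("Bulbasaur", "Charmander"),
  hp      = c(45, 39),
  attack  = c(49, 52),
  defense = c(49, 43)
)

# Long format: one row per Pokémon-stat pair
long <- pivot_longer(
  wide,
  cols = c(attack, defense),
  names_to = "stat_category",
  values_to = "stat_value"
)
```

`long` has four rows (two Pokémon times two stats). The columns that were not pivoted, `name` and `hp`, are repeated on each row, which is what lets `facet_wrap(~stat_category)` split the plot by stat.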

Summary

  • Aesthetics determine how data points are visually distinguished, including aspects like color, size, and shape.
  • Scales control how data is mapped onto visual dimensions such as x- and y-axes. Proper scaling ensures that visualizations are easy to interpret and not misleading.
  • Manipulating both aesthetics and scales can reveal patterns and/or outliers in data.
  • Preserving scales on faceted plots can make them directly comparable.